Towards Systematic Parallel Programming over MapReduce

نویسندگان

  • Yu Liu
  • Zhenjiang Hu
  • Kiminori Matsuzaki
چکیده

MapReduce is a useful and popular programming model for data-intensive distributed parallel computing. But it is still a challenge to develop parallel programs with MapReduce systematically, since it is usually not easy to derive a proper divide-and-conquer algorithm that matches MapReduce. In this paper, we propose a homomorphism-based framework named Screwdriver for systematic parallel programming with MapReduce, making use of the program calculation theory of list homomorphisms. Screwdriver is implemented as a Java library on top of Hadoop. For any problem which can be resolved by two sequential functions that satisfy the requirements of the third homomorphism theorem, Screwdriver can automatically derive a parallel algorithm as a list homomorphism and transform the initial sequential programs to an efficient MapReduce program. Users need neither to care about parallelism nor to have deep knowledge of MapReduce. In addition to the simplicity of the programming model of our framework, such a calculational approach enables us to resolve many problems that it would be nontrivial to resolve directly with MapReduce.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming

The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations because the rapid development in the use of information technology in general and network technology in particular, has led to the trend of many organizations to make their applications available for use via electronic platforms hos...

متن کامل

Generate, Test, and Aggregate - A Calculation-based Framework for Systematic Parallel Programming with MapReduce

MapReduce, being inspired by the map and reduce primitives available in many functional languages, is the de facto standard for large scale data-intensive parallel programming. Although it has succeeded in popularizing the use of the two primitives for hiding the details of parallel computation, little effort has been made to emphasize the programming methodology behind, which has been intensiv...

متن کامل

Calculating Edit Distance for Large Sets of String Pairs using MapReduce

Given two strings X and Y over a finite alphabet, the edit distance between X and Y , d(X,Y ) is the number of elementary edit operations required to edit X into Y . A dynamic programming algorithm elegantly computes this distance. In this paper, we investigate the parallelization of calculating edit distance for a large set of strings using MapReduce, a popular parallel computing framework. We...

متن کامل

Distributed Keyword Search over RDF via MapReduce

Non expert users need support to access linked data available on the Web. To this aim, keyword-based search is considered an essential feature of database systems. The distributed nature of the Semantic Web demands query processing techniques to evolve towards a scenario where data is scattered on distributed data stores. Existing approaches to keyword search cannot guarantee scalability in a d...

متن کامل

Datenbank-spektrum Schwerpunkt: Mapreduce Programming Model Compilation of Query Languages into Mapreduce Efficient or Hadoop: Why Not Both? Parallel Entity Resolution with Dedoop Inkrementelle Neuberechnungen in Mapreduce Fachbeitrag towards Integrated Data Analytics: Time Series Forecasting in Dbms

s publiziert/indexiert in Google Scholar, Academic OneFile, DBLP, io-port.net, OCLC, Summon by Serial Solutions. Hinweise für Autoren für die Zeitschrift Datenbank Spektrum finden Sie auf www.springer.com/13222. Datenbank Spektrum (2013) 13:1–3 DOI 10.1007/s13222-013-0116-z

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011